Abstract Body


Background / Context: 

Recent work that has examined the impact of what are variously called periodic, interim,
benchmark, or diagnostic assessments, typically administered three or four times during a school
year, has produced mixed findings. For instance, one study reported small significant effects in
mathematics in grades 3-8, but not in reading (Carlson et al., 2011). Other research, however, has
reported significant effects on both mathematics and reading (Slavin et al., 2011). Finally, a very
recent study found no effects on reading achievement in grades 4-5 (Cordray et al., 2012).

The state of Indiana was among the first to implement statewide technology-supported 
interim assessments in math and English Language Arts (ELA) to be taken by all K-8 students 
multiple times each school year at volunteering schools. Indiana expects teachers to use 
assessment information to improve ongoing instruction and increase student achievement. In 
2008 the Indiana Department of Education (IDOE) began its roll-out of what it called its
“diagnostic assessment tools.” 

In 2009-10, the American Institutes for Research conducted the first round of a two-cohort
randomized controlled trial to evaluate the effectiveness of the interim assessment tool in
schools receiving it for the first time (Konstantopoulos et al., in press). Findings suggested a
positive but modest treatment effect across all grades. Still, even small positive impacts in the
first year of an interim assessment intervention are notable, given evidence suggesting that such
interventions may take multiple years to affect student performance (Slavin et al., 2011).
Further, observed effect sizes in the range of 0.10 to 0.19 are of substantive policy interest.

The theory of action supporting interim assessments’ effectiveness hinges on teachers 
making changes to their instructional practice (Blanc et al., 2010). In particular, differentiation of
content scope and sequence, instructional level and grouping methods are among aspects of
instructional practice theorized to improve quality of instruction by drawing on improved
information about student needs (Tomlinson, 2000). Evidence suggesting small, positive impacts
in schools’ first year using interim assessments motivates this study’s focus on areas of teacher 
practice hypothesized to be intermediate outcomes of the interim assessment intervention. 

Purpose / Objective / Research Question / Focus of Study: 

This study compares instructional practices of teachers in schools that were randomly 
assigned to receive an interim assessment tool with those of teachers in schools that did not 
receive the tool. Using rich data collected at 16 time points during the school year, we study 
teachers’ self-reported instructional practices to determine whether teachers with access to an 
interim assessment tool alter each of three facets of instructional practice — scope and sequence 
of content coverage, instructional level, and instructional grouping — more than those without the 
tool. Our research questions are: 

(1) Do teachers with access to the interim assessment change the scope and sequence of content, 
and/or vary instructional difficulty level and grouping methods more than those without? 

(2) Do variations in these teacher practices respond to variations in student Acuity performance? 

Setting: 

The data used in this study are drawn from an RCT that took place in Indiana in 2009-2010.
Schools were randomly identified from a queue of K-8 public schools that had volunteered
to implement diagnostic assessments in the spring of 2009. This set of schools was then
randomly assigned to treatment or a control (one-year delay in implementation) condition.

Population / Participants / Subjects: 

Data on instructional practices were collected for and 5th grade teachers as part of the
RCT described above. This study focuses on 5th grade teachers using the Acuity Predictive
assessment, a version of the interim assessment aligned to the statewide ISTEP+ exam
administered each spring.* Eight students were randomly sampled in each participating teacher’s
class. Please see Table 1 for details on the study sample in the context of the broader RCT.

Intervention / Program / Practice: 

As described by Konstantopoulos and colleagues (2011), the interim assessment tool
studied here is a series of 30-35 item multiple-choice tests in mathematics and ELA,
administered three times during the school year. The tests are closely aligned to Indiana’s 
statewide year-end test, and the intervention provided teachers with rapid access (within 24 
hours) to a variety of class- and student-level reports on performance, including predicted 
proficiency on the year-end test. 

Significance / Novelty of study: 

A small but increasing number of rigorous evaluations of interim assessments exist (May
and Robinson, 2007; Henderson et al., 2007; Quint, Sepanik and Smith, 2008; Carlson et al.,
2011; Slavin et al., 2011; Cordray et al., 2012). However, the current study draws on
considerably richer data on teacher practices than existing impact studies. As described in the 
data collection section, we analyze data from detailed checklists (or “logs”) completed by 
teachers for each of eight students at sixteen time points during the school year. These data 
provide a nuanced picture of instructional practices utilized by teachers with and without the 
interim assessment intervention. By applying existing analytic methods to repeated, detailed 
measurements of teacher practice, we provide new evidence on teacher practices as intermediate 
outcomes responding to interim assessment information in the first year of implementation. 

Research Design: 

We employ treatment vs. control comparisons to explore whether teachers with the 
interim assessment intervention engage in expected instructional practices more than those 
without it. Treatment-group-only analyses of the association between teacher practices and
student assessment performance provide evidence on the extent to which teachers target 
instruction to student performance. In this analysis, each testing window is considered as a 
juncture at which teachers potentially acquire new information about students. Accordingly, 
teacher change in instructional level is estimated at each assessment window. 

Statistical, Measurement, or Econometric Model:

* An additional rationale for this focus is that subgroup analyses in the original impact study indicate that measured
impacts were largest in fifth grade and among Acuity Predictive users.

^ Please note that outcome measures are described at the end of the Data Collection section.

When comparing Acuity teachers with comparison teachers, we employ hierarchical
generalized linear models that account for the data’s nested structure, with instructional logs
(multiple time observations) nested within students, who are nested within teachers, who are in
turn nested within schools. We use a logistic model to estimate differences between treatment
and control teachers on binary measures of instructional practice (described in the Data
Collection and Analysis section below). We consider teacher decisions such as instructional
grouping and level of instruction as student-level outcomes measured at each log, because the
theory of differentiated instruction implies that teachers make distinct instructional decisions for
each student. In contrast, we model curricular decisions, such as whether a given topic is covered
on a given day, as class-level phenomena, since these are made for the whole class. We model
the parameter of interest, the treatment-control contrast, as a fixed coefficient and account for the
nested data structure using random effects at the student, teacher and school level. The resulting
model can be expressed as follows:

\[ \operatorname{logit}\,\Pr(Y_{ijkt} = 1) = \beta_0 + \beta_1 D_k + u_k + v_{jk} + w_{ijk}, \qquad (1) \]

where $Y_{ijkt}$ is the student-level observation of teacher practice, $D_k$ indicates whether a school
received the Acuity Predictive interim assessment tool, $\beta_1$ is the parameter of interest measuring
the treatment-control contrast, and $u_k$, $v_{jk}$, and $w_{ijk}$ represent school, teacher, and student random
effects, respectively.^ This model is estimated for the full sample including all logs for each
student, as well as for subsamples including only the logs following each interim assessment
window.
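As a minimal sketch of how the terms in Equation 1 combine, the Python below assumes the linear predictor is an intercept plus a treatment coefficient plus school, teacher, and student random effects, passed through an inverse logit. The function names and all numeric values are hypothetical illustrations, not the study's estimation code or estimates.

```python
import math
import random


def inv_logit(x: float) -> float:
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def practice_probability(beta0, beta1, treated, u_school, v_teacher, w_student):
    """Probability that a practice is reported on one log.

    treated plays the role of the school-level treatment indicator
    (1 = treatment school, 0 = control); the last three arguments are the
    school, teacher, and student random effects.
    """
    eta = beta0 + beta1 * treated + u_school + v_teacher + w_student
    return inv_logit(eta)


def simulate_logs(n_logs, beta0, beta1, treated,
                  sd_school, sd_teacher, sd_student, seed=0):
    """Draw n_logs binary outcomes for one student (hypothetical inputs)."""
    rng = random.Random(seed)
    u = rng.gauss(0.0, sd_school)    # school random effect
    v = rng.gauss(0.0, sd_teacher)   # teacher random effect
    w = rng.gauss(0.0, sd_student)   # student random effect
    p = practice_probability(beta0, beta1, treated, u, v, w)
    return [1 if rng.random() < p else 0 for _ in range(n_logs)]


# With every random effect set to zero and a baseline probability of 0.50,
# an illustrative logit coefficient of 0.35 raises the probability by
# roughly nine percentage points.
p_control = practice_probability(0.0, 0.35, 0, 0.0, 0.0, 0.0)
p_treated = practice_probability(0.0, 0.35, 1, 0.0, 0.0, 0.0)
print(round(p_control, 3), round(p_treated, 3))

logs = simulate_logs(16, 0.0, 0.35, 1, 0.5, 0.5, 0.5)
```

Because the random effects enter on the log-odds scale, the same coefficient implies a smaller shift in probability for students whose baseline is far from 0.50.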

When analyzing associations between student Acuity performance and teacher practices, 
we consider a “differences in differences” specification in which teacher practice prior to each 
test acts as a counterfactual for practice following the test window. The first difference in the 
model is an average difference between students in the top and bottom half of their class 
sample’s performance on the Acuity test (for example, a difference in share of students 
experiencing remedial instruction). If the contrast between top- and bottom-half performers 
grows following the test window, this is consistent with the hypothesis that teachers change their 
instructional practices based on new information from the Acuity assessment. This model can be 
expressed as follows: 

\[ \operatorname{logit}\,\Pr(Y_{ijkt} = 1) = \beta_0 + \delta (T_i \times P_t) + u_k + v_{jk} + w_{ijk}, \qquad (2) \]

where $Y_{ijkt}$ is the student-level observation of teacher practice, $T_i$ is an indicator taking “1”
when a student is in the bottom half of the class sample’s Acuity performance for a given test
window, $P_t$ is an indicator taking “1” in the period following the test window and “0” before, $\delta$
is the parameter of interest measuring the difference in differences, and the last three terms are
random effects as described in Equation 1. This model is estimated for two subsamples 
corresponding to the two Acuity assessment periods for which there are pre- and post-test data. 
Each subsample includes four instructional log dates, two before the test window and two after. 
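The difference-in-differences logic described above can be illustrated with raw shares, setting aside the random effects. The sketch below is a simplification under that assumption, with made-up 0/1 flags: it computes the bottom-half versus top-half gap in remedial instruction on the log-odds scale before and after a test window, then takes the change in that gap.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def share(flags):
    """Share of student-log observations where the practice was reported."""
    return sum(flags) / len(flags)


def did_log_odds(bottom_pre, bottom_post, top_pre, top_post):
    """Change in the bottom-vs-top gap (log-odds scale) across a test window.

    Each argument is a list of 0/1 flags, one per student-log observation.
    Every group's share must lie strictly between 0 and 1.
    """
    gap_pre = logit(share(bottom_pre)) - logit(share(top_pre))
    gap_post = logit(share(bottom_post)) - logit(share(top_post))
    return gap_post - gap_pre


# Hypothetical flags: remedial instruction is equally common in both halves
# before the test window and becomes more common for the bottom half afterward.
bottom_pre = [1, 0, 0, 0]
top_pre = [1, 0, 0, 0]
bottom_post = [1, 1, 0, 0]
top_post = [1, 0, 0, 0]

did = did_log_odds(bottom_pre, bottom_post, top_pre, top_post)
print(round(did, 3))
```

A positive value means the contrast between bottom- and top-half students grew after the test window, which is the pattern the hypothesis predicts.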

Usefulness / Applicability of Method: 

The usefulness of the methods applied in this study is described in the significance section. 

Data Collection and Analysis: 


^ In analysis of topics covered by teachers as an outcome (described briefly in the “Data Collection and Analysis”
section), the teacher-date, rather than the student-date, is considered as an observation, because topic-level content
coverage decisions are conceived as applying to the whole class. Accordingly, a three-level model is used, with no
teacher-level random effect specified.
Teachers in control and treatment schools in grade 5 were asked to complete 16
instructional checklists throughout the school year, roughly one every two weeks. Our staff
developed a separate checklist for math and ELA. The ELA checklists were based on Rowan and
Correnti’s checklist (2009). The mathematics checklists were developed by content experts,
following the ELA model and guided by the Indiana mathematics standards. In each checklist,
items were categorized by topic area. The math checklist had seven topic areas: number sense;
computation; algebra and function; geometry; measurement; problem solving; and data analysis
and probability. Each topic area contained items related to teacher instruction, concepts and
skills, and student activities. The ELA checklist contained nine topic areas, and collected the
same detailed instructional information as the math logs on five of these: comprehension,
writing, word analysis, reading fluency, and vocabulary. Teachers completed checklists online
and results were stored on servers.

Following procedures described by Rowan and Correnti (2009), eight students were
randomly selected by each teacher to focus on while completing the checklist. These same eight
students were used for the entire year. For each checklist date, teachers indicated whether each
student was instructed in each topic and whether they used a particular instructional grouping 
method with each student. If they had taught particular content, they indicated whether they had 
taught that student at the remedial, regular, or enriched level. 

Using data collected by teacher checklists, we developed binary measures indicating 
whether a student experienced relevant instructional practices on a given day. A series of binary 
variables - one for each topic area - indicate whether a student received instruction on each topic 
on that day. A second binary variable indicates whether a student received any remedial or 
enriched instruction that day. A third binary variable, used in the difference in differences model 
described above, measures whether a student received remedial instruction on a given day. 
Finally, two binary variables indicate whether a student received instruction in a small-group or
individual format that day. 
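The construction of these binary measures can be sketched as follows. The record layout (`LogEntry`), the field names, and the topic identifiers are hypothetical stand-ins for the checklist data; only the measure definitions follow the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# The seven math topic areas named in the checklist description.
MATH_TOPICS: List[str] = [
    "number_sense", "computation", "algebra_and_functions", "geometry",
    "measurement", "problem_solving", "data_analysis_and_probability",
]
TARGETED_LEVELS = {"remedial", "enriched"}


@dataclass
class LogEntry:
    """One teacher report for one student on one checklist date (assumed layout)."""
    # topic -> "remedial" | "regular" | "enriched", only for topics taught
    topic_levels: Dict[str, str] = field(default_factory=dict)
    # grouping methods used with the student, e.g. {"small_group", "individual"}
    groupings: Set[str] = field(default_factory=set)


def binary_measures(entry: LogEntry, topics: List[str]) -> Dict[str, int]:
    """Derive the 0/1 outcome variables for one student-date."""
    levels = set(entry.topic_levels.values())
    measures = {f"covered_{t}": int(t in entry.topic_levels) for t in topics}
    measures["any_targeted"] = int(bool(levels & TARGETED_LEVELS))
    measures["any_remedial"] = int("remedial" in levels)
    measures["small_group"] = int("small_group" in entry.groupings)
    measures["individual"] = int("individual" in entry.groupings)
    return measures


# Hypothetical entry: geometry taught at the remedial level in a small group,
# computation taught at the regular level.
entry = LogEntry(
    topic_levels={"geometry": "remedial", "computation": "regular"},
    groupings={"small_group"},
)
measures = binary_measures(entry, MATH_TOPICS)
print(measures)
```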

Findings / Results: 

Initial findings suggest little evidence of strong impacts on teacher practice as a result of 
access to the Acuity Predictive interim assessment tool.^ The time series presented in Figures 1-4
show some periods where Acuity teachers sustain higher levels of engagement in specific
instructional practices than comparison teachers. However, these selective periods do not cohere
into a broader pattern of Acuity teachers using expected practices more widely than comparison
teachers. Panel A of Table 2 presents estimates of the treatment-control contrast estimated using
a three-level adaptation of Equation 1 over all logs; statistically significant contrasts do not
emerge in any of the seven content areas. While estimates of the treatment vs. control contrast in
use of individual and small group instruction (Panel B) are both positive and of substantial
magnitude, they are also not statistically significant. The estimated difference in levels of
targeted (enriched or remedial) instruction in Panel C similarly indicates a lack of significant
difference between the groups, although the estimate’s sign and magnitude suggest that Acuity
teachers may increase levels of targeted instruction. Results from the difference in differences
models (Equation 2 above) are not presented here but are broadly confirmatory, characterized by 
mixed signs and few significant estimates. 
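
To make the relationship between the logit coefficients, odds ratios, and standard errors in Table 2 concrete, the sketch below converts one reported estimate (small group instruction: coefficient 0.35, standard error 0.52) into an odds ratio, a z statistic, and an approximate 95% interval. The Wald-style normal approximation is an assumption made here for illustration; the source does not state which test procedure was used.

```python
import math


def summarize(coef: float, se: float, z_crit: float = 1.96) -> dict:
    """Odds ratio, z statistic, and Wald-type interval for a logit coefficient.

    Assumes a normal approximation on the log-odds scale (an illustration,
    not necessarily the procedure used in the study).
    """
    return {
        "odds_ratio": math.exp(coef),
        "z": coef / se,
        "ci_low": math.exp(coef - z_crit * se),
        "ci_high": math.exp(coef + z_crit * se),
    }


# Small group instruction row of Table 2, Panel B.
small_group = summarize(0.35, 0.52)
print({k: round(v, 2) for k, v in small_group.items()})
```

Under this approximation the interval for the odds ratio spans 1, which matches the description of the contrast as positive and sizable but not statistically significant.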

Conclusions: 


^ Results discussed in this abstract are summary in nature and only consider mathematics instruction.
We report results from rich data on teacher instructional practices generated at sixteen
intervals by teachers with and without access to a specific interim assessment tool. Estimates 
provide no strong evidence that teachers change the instructional practices measured here in 
response to Acuity performance data. One possible reason for these findings is that Acuity is not 
a unique intervention, and a significant number of control teachers reported using other interim 
assessment tools. Another possible explanation for these results is that the relatively small
sample of teachers completing checklists limits the study’s statistical power. Finally, these results pertain
to the first year of the intervention, when teachers are likely still learning how to use the 
assessment tool and integrate it into their instructional practice. Future research should explore 
the hypothesis that impacts on teacher practice grow over time as teachers learn to use the 
assessment tool. 
Table 1. Sample Size, Full RCT Sample and Subsample with Instructional Log Data

Grade 5 Study Samples

                All Acuity,   Acuity Predictive Users,   Acuity Predictive Users,
                Full RCT      Full RCT                   Log Data Collected
Observations    All           All      T        C        All      T        C
Schools         56            29       19       10       22       12       10
Teachers        148           87       57       30       52       27       25
Students        3,711         1,962    1,233    729      416      216      200

The right-most three columns report schools, teachers and students in the present study sample.

Table 2. Treatment vs. Comparison Contrast in Three Measures of Instructional Practice

                                        Treatment-Control   Treatment-Control
                                        Contrast            Contrast              Standard
Area of Instructional Practice          (Odds Ratio)        (Logit coefficient)   Error      N

Panel A. Content Coverage                                                                    736
  Number Sense                          1.01                0.01                  (0.37)
  Computation                           1.06                0.06                  (0.27)
  Algebra and Functions                 1.25                0.23                  (0.41)
  Geometry                              0.93                -0.07                 (0.24)
  Measurement                           1.09                0.08                  (0.30)
  Problem Solving                       1.05                0.05                  (0.28)
  Data Analysis and Probability         1.03                0.03                  (0.32)

Panel B. Instructional Grouping Methods                                                      5450
  Small Group Instruction               1.42                0.35                  (0.52)
  Individual Instruction                1.29                0.25                  (0.94)

Panel C. Instructional Difficulty Level                                                      4662
  Received at least One Concept at
  Enriched or Remedial                  1.41                0.35                  (0.80)

* p < 0.05

Figure 1. Average Levels of Math Content Coverage in Seven Content Areas, 9/2009-5/2010

[Figure: Math Content Coverage. Panels by content area, including Geometry (%) and Number Sense (%). X-axis: Log Date, 01oct2009 to 01jun2010. Series: Control, Treatment. Sample size = 24 (C), 24 (T).]

Figure 2. Average Levels of Small Group Instruction in Math, 9/2009 - 5/2010

[Figure: Share Receiving Small Group Instruction. Y-axis: Share of Students. Series: Control, Treatment.]

Figure 3. Average Levels of Individual Instruction in Math, 9/2009 - 5/2010

[Figure: Share Receiving Individual Instruction. Y-axis: Share of Students. Series: Control, Treatment.]

Figure 4. Average Student Share of Topics at Remedial or Enriched Level, 9/2009 - 5/2010

[Figure: Rate of Targeted Level, Math Predictive Sample. Y-axis: Average Student Rate of Targeted Level. X-axis: Instructional Log Date. Series: Control, Treatment.]